Taming Non-stationary Bandits: A Bayesian Approach

Authors

  • Vishnu Raj
  • Sheetal Kalyani
Abstract

We consider the multi-armed bandit problem in non-stationary environments. Based on the Bayesian method, we propose a variant of Thompson Sampling that can be used in both rested and restless bandit scenarios. By applying discounting to the parameters of the prior distribution, we describe a way to systematically reduce the effect of past observations. Further, we derive the exact expression for the probability of picking sub-optimal arms. By increasing the exploitative value of Bayes’ samples, we also provide an optimistic version of the algorithm. Extensive empirical analysis is conducted under various scenarios to validate the utility of the proposed algorithms. A comparison study with various state-of-the-art algorithms is also included.
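
The discounting idea in the abstract can be illustrated with a minimal sketch. It assumes Bernoulli rewards with Beta priors, which the abstract does not state, and the class name, the discount factor `gamma`, and the two-arm test environment are all hypothetical choices made for illustration. It is not the authors' reference implementation, and it omits the optimistic variant.

```python
import random


class DiscountedThompsonSampling:
    """Illustrative Beta-Bernoulli Thompson Sampling with discounted counts.

    Assumption: rewards lie in [0, 1] and each arm has a Beta prior.
    gamma = 1.0 recovers ordinary (stationary) Thompson Sampling.
    """

    def __init__(self, n_arms, gamma=0.95, prior_alpha=1.0, prior_beta=1.0, seed=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must lie in (0, 1]")
        self.gamma = gamma
        self.prior_alpha = prior_alpha
        self.prior_beta = prior_beta
        self.successes = [0.0] * n_arms  # discounted success counts
        self.failures = [0.0] * n_arms   # discounted failure counts
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample per arm from its current Beta posterior and
        # play the arm with the largest sample.
        samples = [
            self.rng.betavariate(s + self.prior_alpha, f + self.prior_beta)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Shrink every arm's counts so that old observations fade,
        # then credit the new observation to the arm that was played.
        self.successes = [self.gamma * s for s in self.successes]
        self.failures = [self.gamma * f for f in self.failures]
        self.successes[arm] += reward
        self.failures[arm] += 1.0 - reward


def simulate(horizon=2000, seed=0):
    """Two Bernoulli arms whose means swap halfway through the run.

    Returns how often each arm was played in the final 200 steps.
    """
    env_rng = random.Random(seed)
    agent = DiscountedThompsonSampling(n_arms=2, gamma=0.95, seed=seed + 1)
    late_pulls = [0, 0]
    for t in range(horizon):
        means = (0.9, 0.1) if t < horizon // 2 else (0.1, 0.9)
        arm = agent.select_arm()
        reward = 1.0 if env_rng.random() < means[arm] else 0.0
        agent.update(arm, reward)
        if t >= horizon - 200:
            late_pulls[arm] += 1
    return late_pulls


if __name__ == "__main__":
    print(simulate())
```

Because the counts of unplayed arms also shrink toward the prior, their posteriors widen again over time, which is what lets the sampler revisit an arm whose mean may have changed. Smaller values of `gamma` forget faster at the cost of noisier estimates.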

Similar resources

Improving Online Marketing Experiments with Drifting Multi-armed Bandits

Restless bandits model the exploration vs. exploitation trade-off in a changing (non-stationary) world. Restless bandits have been studied in both the context of continuously-changing (drifting) and change-point (sudden) restlessness. In this work, we study specific classes of drifting restless bandits selected for their relevance to modelling an online website optimization process. The contrib...

On Bayesian Upper Confidence Bounds for Bandit Problems

Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the parameter is drawn from a prior distribution. We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulated regret as a measure of perf...

Stochastic Bandits with Pathwise Constraints

We consider the problem of stochastic bandits, with the goal of maximizing a reward while satisfying pathwise constraints. The motivation for this problem comes from cognitive radio networks, in which agents need to choose between different transmission profiles to maximize throughput under certain operational constraints such as limited average power. Stochastic bandits serve as a natural mode...

Efficient Contextual Bandits in Non-stationary Worlds

Most contextual bandit algorithms minimize regret to the best fixed policy, a questionable benchmark for the non-stationary environments ubiquitous in applications. In this work, we obtain efficient contextual bandit algorithms with strong guarantees for alternate notions of regret suited to these non-stationary environments. Two of our algorithms equip existing methods for i.i.d. problems with sophi...

A Bayesian approach to a dynamic inventory model under an unknown demand distribution

In this paper, the Bayesian approach to demand estimation is outlined for the cases of stationary as well as non-stationary demand. The optimal policy is derived for an inventory model that allows stock disposal, and is shown to be the solution of a dynamic programming backward recursion. Then, a method is given to search for the optimal order level around the myopic order level. Finally, a num...

Journal:
  • CoRR

Volume: abs/1707.09727

Publication date: 2017